Abstract

In the context of inference with expectation constraints, we propose an approach based on the "loopy belief propagation" algorithm (LBP), as a surrogate to an exact Markov Random Field (MRF) modelling. Prior information, composed of correlations among a large set of N variables, is encoded into a graphical model; this encoding is optimized with respect to an approximate decoding procedure (LBP), which is used to infer hidden variables from an observed subset. We focus on the situation where the underlying data have many different statistical components, representing a variety of independent patterns. Considering a single-parameter family of models, we show how LBP may be used to encode and decode such information efficiently, without solving the NP-hard inverse problem yielding the optimal MRF. Contrary to usual practice, we work in the non-convex Bethe free energy minimization framework, and manage to associate a belief propagation fixed point to each component of the underlying probabilistic mixture. The mean-field limit is considered and yields an exact connection with the Hopfield model at finite temperature and steady state, when the number of mixture components is proportional to the number of variables. In addition, we provide an enhanced learning procedure, based on a straightforward multi-parameter extension of the model in conjunction with an effective continuous optimization procedure. This is performed using the stochastic search heuristic CMAES and yields a significant improvement with respect to the single-parameter basic model.



1 Introduction 

Prediction or recognition methods on systems in a random environment have somehow to exploit regularities or correlations, possibly both spatial and temporal, to infer a global behavior from partial observations. For example, on a road-traffic network, one is interested in extracting, from fixed sensors and floating car data, an estimation of the overall traffic situation and its evolution [1]. For image recognition or visual event detection, it is in some sense the mutual information between different pixels or sets of pixels that one wishes to exploit. The natural probabilistic tool to encode mutual information is the Markov Random Field (MRF), whose marginal conditional probabilities have to be computed for the prediction or recognition process.

The inference problem (with expectation constraints [2]) that we want to address is stated as follows: the system is composed of discrete variables $x = \{x_i, i \in V\} \in \{1, \ldots, q\}^{V}$, for which the only known statistical information is in the form of marginal probabilities $p_a(x_a)$ on a set $\mathcal{F}$ of cliques $a \subset V$. Such marginals are typically the result of some empirical procedure producing historical data. Based on this historical information, consider then a situation where some of the variables are observed, say a subset $x^* = \{x_i^*, i \in V^*\}$, while the others, indexed by the complementary set $V \setminus V^*$, remain hidden. What prediction can be made concerning this complementary set, and how fast can we make this prediction, if we think in terms of real-time applications such as traffic prediction?

Since the variables take their values over a finite set, the marginal probabilities are fully described by a finite set of correlations and, following the maximum entropy principle of Jaynes [3], we expect the historical data to be best encoded in an MRF with a joint probability distribution of $x$ of the form

$$p(x) = \prod_{i \in V} \phi_i(x_i) \prod_{a \in \mathcal{F}} \psi_a(x_a). \qquad (1)$$

This representation corresponds to a factor graph [4], where for convenience we associate a function $\phi_i(x_i)$ to each variable $i \in V$, in addition to the subsets $a \in \mathcal{F}$ that we call "factors". $\mathcal{F}$ together with $V$ defines the factor graph $\mathcal{G}$, which will be assumed to be connected.
There are two main issues:

• inverse problem: how to set the parameters of (1) in order to fulfill the constraints imposed by the historical data?

• inference: how to decode this information (in the sense of computing marginals) in the most efficient manner, typically in real time, in terms of conditional probabilities $P(x \mid x^*)$?

Exact procedures generally face an exponential complexity problem both for the encoding and for the decoding, and one has to resort to approximate procedures [5]. The Bethe approximation [6], which is used in statistical physics, consists in minimizing an approximate version of the variational free energy associated to (1). In computer science, the belief propagation (BP) algorithm [7] is a message passing procedure that efficiently computes exact marginal probabilities when the underlying graph is a tree. When the graph has cycles, it is still possible to apply the procedure (then referred to as LBP, for "loopy belief propagation"), which converges with rather good accuracy on sufficiently sparse graphs. However, there may be several fixed points, either stable or unstable. It has been shown that these points coincide with stationary points of the Bethe free energy [8], which is defined as follows:

$$F(b) = -\sum_{a \in \mathcal{F}} \sum_{x_a} b_a(x_a) \log \psi_a(x_a) - \sum_{i \in V} \sum_{x_i} b_i(x_i) \log \phi_i(x_i) + \sum_{a \in \mathcal{F}} \sum_{x_a} b_a(x_a) \log b_a(x_a) + \sum_{i \in V} \sum_{x_i} (1 - d_i)\, b_i(x_i) \log b_i(x_i), \qquad (2)$$

where $d_i$ is the number of factors containing variable $i$.

In addition, stable fixed points of LBP are local minima of the Bethe free energy [9]. The question of convergence of LBP has been addressed in a series of works [10, 11, 12] establishing conditions and bounds on the MRF coefficients for having global convergence. In the present work, we reverse the viewpoint. Since the decoding procedure is performed with LBP, presumably the best encoding of the historical data is the one for which LBP's output is $p_a$ in the absence of "real time" information, that is, when all the variables remain hidden ($V^* = \emptyset$). This has actually been proposed in [13], where it is proved in a specific case that working with the "wrong" model, i.e. the message passing approximate version, yields better results from the decoding viewpoint. We will come back to this in Section 6, when we compare various possible approximate models within this framework. In this paper, we propose a new approach, based on the identification of multiple fixed points of LBP, able to deal with both the encoding and the decoding procedure in a consistent way, suitable for real-time applications.
The paper is organized as follows: our inference strategy is detailed in Section 2; in Section 3 we specialize the problem to the inference of binary variables whose distribution follows a mixture of product forms, and present some numerical results; these are analyzed in Section 4 in the light of some scaling limits where mean-field equations become relevant, allowing for a direct connection with the Hopfield model. In Section 5 we propose a multi-parameter extension of the model well suited to continuous optimization, which enhances the performance of the model. Finally, we conclude in Section 6 by comparing our approach with other variants of LBP and giving perspectives for future developments.

2 LBP inference with marginal constraints 
2.1 The belief propagation algorithm 

The belief propagation algorithm [7] is a message passing procedure, which takes a joint probability measure like (1) as input and outputs a set of estimated marginal probabilities, the beliefs $b_a(x_a)$ (including single-node beliefs $b_i(x_i)$). The idea is to factor the marginal probability at a given site as a product of contributions coming from neighboring factor nodes, which are the messages. With our definition of the joint probability measure, the update rules read:

$$m_{a \to i}(x_i) \leftarrow \sum_{x_{a \setminus i}} \psi_a(x_a) \prod_{j \in a \setminus i} n_{j \to a}(x_j), \qquad (3)$$

$$n_{i \to a}(x_i) \stackrel{\text{def}}{=} \phi_i(x_i) \prod_{a' \ni i,\, a' \neq a} m_{a' \to i}(x_i), \qquad (4)$$

where the notation $\sum_{x_s}$ should be understood as summing all the variables $x_i$, $i \in s \subset V$, from 1 to $q$. When the algorithm converges, the resulting beliefs are

$$b_i(x_i) = \frac{1}{Z_i}\, \phi_i(x_i) \prod_{a \ni i} m_{a \to i}(x_i), \qquad (5)$$

$$b_a(x_a) = \frac{1}{Z_a}\, \psi_a(x_a) \prod_{i \in a} n_{i \to a}(x_i), \qquad (6)$$

where $Z_i$ and $Z_a$ are the corresponding normalization constants that make these beliefs sum to 1. These constants reduce to 1 when $\mathcal{G}$ is a tree. In practice, the messages are normalized to have

$$\sum_{x_i = 1}^{q} m_{a \to i}(x_i) = 1. \qquad (7)$$

A simple computation shows that equations (5) and (6) are compatible, since (3)-(4) imply that

$$\sum_{x_{a \setminus i}} b_a(x_a) = b_i(x_i). \qquad (8)$$

We can already address the inference issue of the introduction: inferring the law of all variables from the set $V^*$ of variables on which data is known is equivalent to evaluating the conditional probability

$$P(x_i \mid x^*) = \frac{P(x_i, x^*)}{P(x^*)}.$$

LBP is adapted to this case if a specific rule is defined for known variables $i \in V^*$: since the value of $x_i^*$ is known, there is no need to sum over possible values and (4) becomes

$$n_{i \to a}(x_i) \stackrel{\text{def}}{=} \begin{cases} \phi_i(x_i) \prod_{a' \ni i,\, a' \neq a} m_{a' \to i}(x_i), & \text{if } i \notin V^* \text{ or } x_i = x_i^*, \\ 0, & \text{otherwise.} \end{cases} \qquad (9)$$
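As an illustration of the update rules (3)-(4) and of the clamping rule (9), here is a minimal Python sketch restricted to pairwise factors. It is not the implementation used for the experiments: the function name `run_lbp`, the dictionary-based message storage, the sequential update schedule, the 0-based state labels and the stopping tolerance are all assumptions made for the example.

```python
import numpy as np

def run_lbp(phi, psi, observed=None, n_iter=200, tol=1e-10):
    """Loopy BP on a pairwise factor graph (illustrative sketch).

    phi      : array (N, q), phi[i, x] = phi_i(x), states labelled 0..q-1
    psi      : dict {(i, j): array (q, q)}, psi[(i, j)][x_i, x_j]
    observed : dict {i: x_i*} of clamped variables (hypothetical interface)
    Returns the single-variable beliefs, array (N, q).
    """
    N, q = phi.shape
    observed = observed or {}
    # mask[i, x] = 0 for the values excluded by an observation, as in rule (9)
    mask = np.ones((N, q))
    for i, x_star in observed.items():
        mask[i] = 0.0
        mask[i, x_star] = 1.0

    factors = list(psi)
    nbrs = {i: [] for i in range(N)}
    for a in factors:
        nbrs[a[0]].append(a)
        nbrs[a[1]].append(a)
    # m[(a, i)] is the message from factor a to variable i
    m = {(a, i): np.full(q, 1.0 / q) for a in factors for i in a}

    def n_msg(j, a):
        """Variable-to-factor message n_{j->a}, update (4) with clamping."""
        out = phi[j] * mask[j]
        for a2 in nbrs[j]:
            if a2 != a:
                out = out * m[(a2, j)]
        return out

    for _ in range(n_iter):
        delta = 0.0
        for a in factors:
            i, j = a
            updates = ((i, psi[a] @ n_msg(j, a)),      # sum over x_j, update (3)
                       (j, psi[a].T @ n_msg(i, a)))    # sum over x_i, update (3)
            for tgt, new in updates:
                new = new / new.sum()                  # normalization (7)
                delta = max(delta, np.abs(new - m[(a, tgt)]).max())
                m[(a, tgt)] = new
        if delta < tol:
            break

    # single-variable beliefs, equation (5)
    beliefs = phi * mask
    for i in range(N):
        for a in nbrs[i]:
            beliefs[i] = beliefs[i] * m[(a, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

On a tree this sketch returns the exact marginals, and passing `observed={i: value}` returns the conditional marginals given the clamped variables; on a graph with cycles the output is only the LBP approximation and depends on which fixed point is reached.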

2.2 Setting the model with LBP 

Fixed points of the LBP algorithm yield only approximate marginal probabilities of $P(x)$ when all the functions $\psi_a$ and $\phi_i$ are known and considered as an input. Conversely, assume that a set of marginal distributions $\{p\}$ is given such that, for all $a \in \mathcal{F}$ and $i \in a$,

$$\sum_{x_{a \setminus i}} p_a(x_a) = p_i(x_i) \quad \text{and} \quad \sum_{x_i} p_i(x_i) = 1.$$

Finding the set of $\{\psi_a\}$ and $\{\phi_i\}$ such that the marginals of the joint probability (1) match $\{p\}$ is a difficult inverse problem. If we instead impose that the LBP approximation of these marginals matches $\{p\}$, we face a much simpler problem: owing to its reparametrization property [14], LBP can provide good candidates for $\psi_a$ and $\phi_i$ that admit a fixed point where $b_a(x_a) = p_a(x_a)$, $\forall a \in \mathcal{F}$, and therefore $b_i(x_i) = p_i(x_i)$, $\forall i \in V$.

We look for a fixed point that satisfies (3)-(4) in addition to this constraint. Normalization constants introduced in (5)-(6) play no role in the present discussion, so we ignore them here. Using (5)-(6) to rewrite (1), one sees that the knowledge of one set of beliefs is sufficient to determine the underlying MRF uniquely:

$$p(x) = \prod_{i \in V} b_i(x_i) \prod_{a \in \mathcal{F}} \frac{b_a(x_a)}{\prod_{i \in a} b_i(x_i)}.$$

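It is therefore tempting to choose the functions appearing in (1) as follows:

$$\phi_i(x_i) = p_i(x_i), \qquad \psi_a(x_a) = \frac{p_a(x_a)}{\prod_{i \in a} p_i(x_i)}. \qquad (10)$$

This leads to the following formulation of the BP algorithm, obtained by inserting (10) into (3)-(4):

$$m_{a \to i}(x_i) \leftarrow \sum_{x_{a \setminus i}} \frac{p_a(x_a)}{p_i(x_i)} \prod_{j \in a \setminus i}\ \prod_{a' \ni j,\, a' \neq a} m_{a' \to j}(x_j), \qquad (11)$$

which obviously admits $m_{a \to i}(x_i) \equiv 1$ as a fixed point, and leads to the beliefs

$$b_a(x_a) = p_a(x_a)\ \ \forall a \in \mathcal{F} \quad \text{and} \quad b_i(x_i) = p_i(x_i)\ \ \forall i \in V. \qquad (12)$$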
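The fixed-point property of this choice can be checked numerically for pairwise factors. The sketch below is only illustrative: the helper names `encode_from_pairs` and `unit_messages_are_fixed` are hypothetical, and the input is assumed to be a set of consistent pairwise marginals $p_{ij}$ stored as arrays. With all messages equal to 1, the variable-to-factor message reduces to $p_j$, so update (3) returns $\sum_{x_j} \psi_{ij}(x_i, x_j)\, p_j(x_j)$, which should be identically 1.

```python
import numpy as np

def encode_from_pairs(p_pair):
    """Choice (10) for a pairwise model (hypothetical helper, not from the paper).

    p_pair : dict {(i, j): array (q, q)} of consistent pairwise marginals.
    Returns (phi, psi) with phi[i] = p_i and psi[(i, j)] = p_ij / (p_i p_j).
    """
    phi, psi = {}, {}
    for (i, j), pij in p_pair.items():
        pi, pj = pij.sum(axis=1), pij.sum(axis=0)   # single-variable marginals
        phi[i], phi[j] = pi, pj
        psi[(i, j)] = pij / np.outer(pi, pj)
    return phi, psi

def unit_messages_are_fixed(phi, psi):
    """True if m = 1 is reproduced by update (3) on every factor, in both directions."""
    return all(np.allclose(f @ phi[j], 1.0) and np.allclose(f.T @ phi[i], 1.0)
               for (i, j), f in psi.items())
```

For example, pairwise marginals extracted from any strictly positive joint distribution pass this check, whether or not the underlying graph is a tree.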

This choice of functions (10) may seem arbitrary at first sight. It has, however, already been proposed in [13] and, for the slightly different problem of ML estimation, in [15]. Moreover, the following proposition shows that any other choice of $\psi$ and $\phi$ is actually equivalent:

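Proposition 2.1. Any given set of functions $\tilde\psi$ and $\tilde\phi$ such that LBP yields the prescribed fixed point provides exactly the same set of fixed points, including their stability properties, as the functions $\psi$ and $\phi$ of (10) would.

Proof. Assume that there exists a set of messages $m^0$ which is a fixed point of LBP run with $\tilde\psi$ and $\tilde\phi$, and such that the resulting beliefs are the prescribed marginals, in particular

$$p_i(x_i) = \tilde\phi_i(x_i) \prod_{a \ni i} m^0_{a \to i}(x_i).$$

Then it is possible to express $\tilde\phi$ and $\tilde\psi$ as

$$\tilde\phi_i(x_i) = \frac{\phi_i(x_i)}{\prod_{a \ni i} m^0_{a \to i}(x_i)}, \qquad \tilde\psi_a(x_a) = \psi_a(x_a) \prod_{i \in a} m^0_{a \to i}(x_i),$$

and relations (3)-(4) rewrite

$$\frac{m_{a \to i}(x_i)}{m^0_{a \to i}(x_i)} \leftarrow \sum_{x_{a \setminus i}} \psi_a(x_a) \prod_{j \in a \setminus i} m^0_{a \to j}(x_j)\, n_{j \to a}(x_j), \qquad m^0_{a \to i}(x_i)\, n_{i \to a}(x_i) = \phi_i(x_i) \prod_{a' \ni i,\, a' \neq a} \frac{m_{a' \to i}(x_i)}{m^0_{a' \to i}(x_i)}.$$

Therefore, $m_{a \to i}(x_i)/m^0_{a \to i}(x_i)$ stands for the set of fixed point messages that would have been obtained with functions $\psi$ and $\phi$, and the two versions of the algorithm are equivalent. ■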



2.3 Controlling the strength of the interaction 

The structure of the factor graph on which LBP is run is more or less imposed by the data. For example, if mutual information is given for each pair of variables, we then have a complete pairwise factor graph. In that case, LBP, which is well adapted to sparse graphs, will overestimate the mutual information between variables. To overcome this flaw, we introduce a single real parameter $\alpha > 0$, to be roughly interpreted as an inverse temperature, whose purpose is to moderate (or possibly amplify) the interaction between variables when the connectivity gets large. This is done through a geometric mean with the independent case, by replacing $p_a$ with $p_a^{\alpha} \big(\prod_{i \in a} p_i\big)^{1-\alpha}$. The model (10) is then rewritten as

$$\phi_i(x_i) = p_i(x_i), \qquad \psi_a(x_a) = \left( \frac{p_a(x_a)}{\prod_{i \in a} p_i(x_i)} \right)^{\alpha}. \qquad (13)$$

This definition allows us to interpolate between a situation with strong interactions ($\alpha \gg 1$) and a situation with weak interactions ($\alpha \simeq 0$). Note that for $\alpha \neq 1$, $p$ is no longer a predefined fixed point of the LBP scheme. However, Section 3 will show that (13) does yield consistent results. In fact, a quite similar deformation of the model has been proposed in [16], which we discuss later in Section 6.

A related approach would have been to replace $p_a$ with $\beta p_a + (1 - \beta) \prod_{i \in a} p_i$; this would preserve the single-variable beliefs, without, however, noticeably affecting the results we present. Note that this is actually equivalent to replacing $\psi_a$ by $\beta \psi_a + (1 - \beta)$.

Finally, an optimization with respect to the graph structure could be done afterwards, but we will not explore this possibility in the present work. Instead, we will focus in Section 5 on the possibility of associating different parameter values to different types of edges, and of performing an optimization procedure with respect to these parameters.
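The two deformations discussed in this subsection are easy to compare on a single pairwise marginal. The following sketch is an illustration under stated assumptions: the function names `psi_alpha` and `psi_mix` are hypothetical, the exponent is the single interaction-strength parameter of the text, and `beta` is the mixing weight of the alternative (arithmetic) interpolation with the independent case.

```python
import numpy as np

def psi_alpha(pij, alpha):
    """Geometric interpolation: (p_ij / (p_i p_j)) ** alpha for one pair."""
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    return (pij / np.outer(pi, pj)) ** alpha

def psi_mix(pij, beta):
    """Arithmetic interpolation: p_ij replaced by beta*p_ij + (1-beta)*p_i*p_j."""
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    indep = np.outer(pi, pj)
    return (beta * pij + (1.0 - beta) * indep) / indep
```

Setting the exponent to 1 recovers the undeformed pairwise function, setting it to 0 gives independent variables, and `psi_mix(pij, beta)` coincides with `beta * psi_alpha(pij, 1) + (1 - beta)`, which is the equivalence stated in the text.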
3 Inferring a hidden mixture of product forms



3.1 Experimental setting 

To test the ideas developed in the previous section, we assume a hidden mixture model on a set $V$ of variables with cardinality $N$, of the form

$$P(x) \stackrel{\text{def}}{=} \frac{1}{C} \sum_{c=1}^{C} \prod_{i \in V} p_i^c(x_i), \qquad (14)$$

where $x = \{x_i, i \in V\}$ is a sequence of binary variables ($x_i \in \{0, 1\}$), $C$ is the number of components of the mixture which are superimposed, and $p_i^c(\cdot)$ is the single-site marginal corresponding to variable $i$ for model $c$. The main virtue of this simplified testbed is that the performance of the approach we propose can be easily compared with theoretical bounds.

In order to apply our inference method, we assume that the distribution (14) is unknown, as well as the number $C$ itself. The input of the algorithm is the set of 1- and 2-variable frequency statistics $p_i(x_i)$ and $p_{ij}(x_i, x_j)$. Part of the freedom in choosing an LBP model is in the graph design. While the available data dictates a pairwise factor graph (each factor node is connected to at most two variables), it is still possible to choose which pairs of variables will be connected. To this end, we apply a simple pruning procedure, by selecting the links for which the following quantity (to be interpreted in Section 4) satisfies

$$\left| \log \frac{p_{ij}(1,1)\, p_{ij}(0,0)}{p_{ij}(0,1)\, p_{ij}(1,0)} \right| > \epsilon,$$

where $\epsilon$ is some positive threshold. We denote by $K$ the mean connectivity of the resulting graph.

Although (14) is quite general, the tests are conducted with $C \ll 2^N$, in the limit where the optimal sequences $x^{c,\mathrm{opt}}$ of each component $c$ (i.e. with highest probability weight in the restricted distribution) have mutual Hamming distance of order $N/2$. The single-site probabilities $p_i^c \stackrel{\text{def}}{=} p_i^c(1)$, corresponding to each component $c$, are generated randomly as i.i.d. variables,

$$p_i^c = \frac{1}{2}\left(1 + \tanh h_i^c\right),$$

with $h_i^c$ uniformly distributed in some fixed interval $[-h_{\max}, +h_{\max}]$. The mean of $p_i^c$ is therefore $1/2$ and its variance reads

$$v \stackrel{\text{def}}{=} \frac{1}{4}\, \mathbb{E}_h\!\left(\tanh^2(h)\right) \in [0, 1/4].$$

This parameter $v$, implicitly fixed by $h_{\max}$, sets the average level of "polarizability" of the variables in each cluster: $v = 0$ corresponds to $p_i^c = 1/2$, while $v = 1/4$ corresponds to $p_i^c \in \{0, 1\}$ with equal probability. The optimal configuration for each component is given by

$$x_i^{c,\mathrm{opt}} \stackrel{\text{def}}{=} 1_{\{p_i^c > 0.5\}}.$$
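The generation of the hidden mixture and of the input statistics described above can be sketched as follows. This is an illustrative reading of the setting, not the code used for the experiments: the function names are hypothetical, the mixture weights are assumed uniform, the pairwise statistics are computed exactly from the components rather than from sampled data, and the pruning is assumed to threshold the absolute value of the log-ratio.

```python
import numpy as np

def make_mixture(N, C, h_max, rng):
    """Component marginals p[c, i] = p_i^c(1) = (1 + tanh h_i^c) / 2."""
    h = rng.uniform(-h_max, h_max, size=(C, N))
    return 0.5 * (1.0 + np.tanh(h))

def pair_statistics(p):
    """Exact 1- and 2-variable statistics of a uniform mixture of product forms."""
    C = p.shape[0]
    p1 = p.mean(axis=0)                  # p_i(1)
    p11 = p.T @ p / C                    # p_ij(1, 1)
    p00 = (1 - p).T @ (1 - p) / C        # p_ij(0, 0)
    p01 = (1 - p).T @ p / C              # p_ij(0, 1)
    p10 = p01.T                          # p_ij(1, 0)
    return p1, p11, p10, p01, p00

def prune(p11, p10, p01, p00, eps):
    """Keep the links whose absolute log-ratio exceeds eps (assumed criterion)."""
    score = np.abs(np.log(p11 * p00 / (p01 * p10)))
    keep = score > eps
    np.fill_diagonal(keep, False)        # no self-links
    return keep

def optimal_configurations(p):
    """x^{c,opt}: most probable sequence of each component."""
    return (p > 0.5).astype(int)
```

The mean connectivity of the pruned graph is then `keep.sum(axis=1).mean()`, and the threshold `eps` can be adjusted until a prescribed value is reached.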
After fixing $N$ and $C$, we randomly generate a set $\{p_i^c, i \in V, 1 \le c \le C\}$ for a given value of $v$. The pruning of the graph is performed to reach a prescribed average connectivity $K$. Then two types of experiments are performed:

• BP fixed points search, with the help of an evanescent guiding field $h_t \to 0$ as $t \to \infty$: if $t$ is the iteration step, we bias the LBP updates (4) in the direction of one of the patterns by replacing $\phi_i(x_i)$ with a version biased toward the pattern $p^c$ by the field $h_t$, so that if there is a belief propagation fixed point correlated to the pattern $p^c$, the field $h_t$, which decays geometrically, helps to find the corresponding attractor. The corresponding set of beliefs $b^c$ which is obtained is then compared to $p^c$.

• decimation: sequences $x^c$ are sampled for each component $c$ of (14), and the decoding algorithm is tested successively (with no guiding field) after gradually revealing the elements of the sequence in a random order; $\rho$ denotes the fraction of observed variables. For each $x^c$ and $\rho$, the output is again a set of beliefs $b^c$ for the hidden variables, to be compared with the exact conditional marginals extracted from (14).



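The following indicators are used to assess the prediction success rate ($R$), the belief error ($E$) and the Kullback-Leibler error ($D_{KL}$) of the algorithm when the values $\{x_j^{c*}, j \in V^*\}$ are known:

$$R \stackrel{\text{def}}{=} \frac{1}{C} \frac{1}{|V \setminus V^*|} \sum_{c=1}^{C} \sum_{i \in V \setminus V^*} \left[ 1_{\{b_i^c(1) > 0.5\}}\, x_i^c + 1_{\{b_i^c(1) < 0.5\}}\, (1 - x_i^c) \right],$$

$$D_{KL} \stackrel{\text{def}}{=} \frac{1}{C} \frac{1}{|V \setminus V^*|} \sum_{c=1}^{C} \sum_{i \in V \setminus V^*} \sum_{x \in \{0,1\}} b_i^c(x) \log \frac{b_i^c(x)}{P_{\mathrm{ref}}(x_i = x \mid x^{c*})},$$

while $E$ is the corresponding average, over components $c$, hidden variables $i \in V \setminus V^*$ and states $x \in \{0, 1\}$, of the absolute difference $|b_i^c(x) - P_{\mathrm{ref}}(x_i = x \mid x^{c*})|$. Here $P_{\mathrm{ref}}(x_i \mid x^*)$ is the conditional distribution of $x_i$ once a certain number of variables $x^*$ have been fixed, computed exactly from the hidden model (14). $R$ is to be compared with the following expected success rate, which would be obtained by making use of the hidden underlying model,

$$R^{(0)} \stackrel{\text{def}}{=} \frac{1}{C} \frac{1}{|V \setminus V^*|} \sum_{c=1}^{C} \sum_{i \in V \setminus V^*} \left[ 1_{\{P_{\mathrm{ref}}(x_i = 1 \mid x^{c*}) > 0.5\}}\, x_i^c + 1_{\{P_{\mathrm{ref}}(x_i = 1 \mid x^{c*}) < 0.5\}}\, (1 - x_i^c) \right].$$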
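For a mixture of product forms, the reference conditional marginals are cheap to compute: the observations reweight the components, and the conditional marginal of a hidden variable is the reweighted average of the component marginals. The sketch below illustrates this for one sampled sequence, together with the success rate and Kullback-Leibler indicators. It is a minimal sketch under stated assumptions: uniform prior weights on the components, binary variables, beliefs strictly between 0 and 1, and hypothetical function names; averaging over components $c$ is left to the caller.

```python
import numpy as np

def exact_conditional(p, obs):
    """P_ref(x_i = 1 | x*) for a uniform mixture of product forms.

    p   : array (C, N), p[c, i] = p_i^c(1)
    obs : dict {j: x_j*} of observed binary values
    The returned values are meaningful for hidden variables only.
    """
    idx = np.fromiter(obs.keys(), dtype=int)
    vals = np.fromiter(obs.values(), dtype=int)
    lik = np.where(vals == 1, p[:, idx], 1.0 - p[:, idx])   # shape (C, n_obs)
    logw = np.log(lik).sum(axis=1)                          # log-likelihood per component
    w = np.exp(logw - logw.max())
    w = w / w.sum()                                         # posterior weights of the components
    return w @ p

def success_rate(b1, x, hidden):
    """Fraction of hidden variables whose most probable value under b matches x."""
    b, xs = b1[hidden], x[hidden]
    return np.mean((b > 0.5) * xs + (b < 0.5) * (1 - xs))

def kl_error(b1, ref1, hidden):
    """Mean over hidden variables of KL(b_i || P_ref), binary case."""
    b, r = b1[hidden], ref1[hidden]
    return np.mean(b * np.log(b / r) + (1 - b) * np.log((1 - b) / (1 - r)))
```

Calling `success_rate` with the output of `exact_conditional` in place of the beliefs gives the reference success rate for that sequence.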

3.2 Preliminary Observations 

To assess this approach, we look first at the quality of the encoding (Figure 1), by studying the nature of the fixed points when all the variables are hidden.
Figure 1: Proportion of actual fixed points (circles) found by LBP, probability of convergence toward a spurious fixed point (squares) from a random initialization, and number of different spurious fixed points divided by the number of runs (100) (left: $C/K = 0.52$, right: $C/K = 0.13$).



Figure 2: Influence of $\alpha$ on the inference success rate $R$ (left) and on the belief error $E$ (right) for a fixed ratio $C/K = 0.125$. Both panels: $N = 1000$, $C = 20$, $v = 0.15$, $\langle K \rangle = 160$.



Regarding the quality of the encoding, we check whether the fixed points of LBP correctly represent the components of the probability mixture, by guiding LBP at the beginning of the iterations. In a second step, we run LBP without guiding, and measure the probability of converging to a spurious fixed point and the diversity of these fixed points. We observe that there is a specific value $\eta^*$ of the ratio $\eta = C/K$, below which it is always possible to find a value of $\alpha$ such that a fixed point is associated to each encoded state and no spurious fixed point is present. In that case, as $\alpha$ varies, three different regimes are found: when $\alpha$ is too small, only one fixed point is present; in the intermediate range of $\alpha$ of interest, all fixed points correspond to the encoded states; and for larger $\alpha$, a proliferation of fixed points occurs, while the ones corresponding to the encoded states are destabilized. This will be analyzed in Section 4.

The second point is the efficiency and reliability of the decoding procedure. The question is to measure how well LBP performs (in terms of $R$ and $E$ defined in the previous section) when the proportion of known variables increases. Figure 2 shows, for several values of $\alpha$, the evolution of $R$ and $E$ as the proportion $\rho$ of revealed variables increases. This is compared with the ideal reconstruction rate $R^{(0)}$, which would be obtained from the underlying mixture model. Typically, for the optimal value of $\alpha$, knowing 10% of the variables is sufficient to reach the optimal inference rate (see left plot). When looking at the mean absolute value error on the beliefs $E$, an error of less than 0.1 is generally achieved with this optimal choice of $\alpha$ (see right plot). The effect of the pruning procedure is shown in Figure 3. The performance deteriorates smoothly when the parameter $v$ decreases.

Figure 3: Influence of the pruning on the inference success rate $R$ (left) and on the belief error $E$ (right) at given $\alpha$.



4 Mean-Field analysis 

4.1 Connection with the Hopfield model for large C 

The connection between the LBP algorithm and statistical physics has been recognized recently. It has been established that the LBP fixed points correspond to local minima of the Bethe free energy [5], and that the LBP scheme actually provides solutions to the mean-field TAP equations [17]. Let us consider the asymptotic situation where both $C$ and $K$ are large. Using the spin variables of statistical physics, $s_i = 2x_i - 1$, the measure (13) may be cast in the standard form of the disordered Ising model

$$P(s) = \frac{1}{Z}\, e^{-\beta H[s]},$$

with $\beta$ the inverse temperature (which is arbitrary for the moment) and the definition

$$H[s] \stackrel{\text{def}}{=} -\frac{1}{2} \sum_{i,j} J_{ij}\, s_i s_j - \sum_i h_i s_i.$$
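The identification with the marginals gives:

$$\beta J_{ij} = \frac{\alpha}{4} \log \frac{p_{ij}(1,1)\, p_{ij}(0,0)}{p_{ij}(0,1)\, p_{ij}(1,0)},$$

$$\beta h_i = \frac{1 - \alpha K_i}{2} \log \frac{p_i(1)}{p_i(0)} + \frac{\alpha}{4} \sum_{j \in i} \log \frac{p_{ij}(1,1)\, p_{ij}(1,0)}{p_{ij}(0,1)\, p_{ij}(0,0)},$$

where $K_i$ is the connectivity of variable $i$ and $j \in i$ runs over its neighbours, with

$$p_i(\tau) \stackrel{\text{def}}{=} \frac{1}{2C} \sum_{c=1}^{C} \big(1 + (2\tau - 1)(2p_i^c - 1)\big),$$

$$p_{ij}(\tau, \tau') \stackrel{\text{def}}{=} \frac{1}{4C} \sum_{c=1}^{C} \big(1 + (2\tau - 1)(2p_i^c - 1)\big)\big(1 + (2\tau' - 1)(2p_j^c - 1)\big),$$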

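for $\tau$ and $\tau'$ in $\{0, 1\}$. Let

$$\xi_i^c \stackrel{\text{def}}{=} \frac{p_i^c - \frac{1}{2}}{\sqrt{v}}, \qquad \xi_i \stackrel{\text{def}}{=} \frac{1}{\sqrt{C}} \sum_{c=1}^{C} \xi_i^c, \qquad \xi_{ij} \stackrel{\text{def}}{=} \frac{1}{\sqrt{C}} \sum_{c=1}^{C} \xi_i^c\, \xi_j^c. \qquad (15)$$

For large $C$, we have, in distribution,

$$\lim_{C \to \infty} \xi_i \sim \mathcal{N}(0, 1), \qquad \lim_{C \to \infty} \xi_{ij} \sim \mathcal{N}(0, 1), \qquad (16)$$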

where $\mathcal{N}(0, 1)$ denotes a normal variable with unit variance. Using this notation, and assuming $C \gg 1$, we have

$$\beta J_{ij} = \frac{4\alpha v}{\sqrt{C}}\, \xi_{ij} + O\!\left(\frac{1}{C}\right), \qquad (17)$$

$$\beta h_i = \frac{2\sqrt{v}}{\sqrt{C}}\, \xi_i - \frac{8\alpha v^{3/2}}{C} \sum_{j \in i} \xi_j\, \xi_{ij} + O\!\left(\frac{1}{C^{3/2}}\right), \qquad (18)$$

for fixed connectivity $K$. Note that, in addition to (16), we have

$$\lim_{K \to \infty} \frac{1}{\sqrt{K}} \sum_{j \in i} \xi_j\, \xi_{ij} \sim \mathcal{N}(0, 1),$$

and that the two terms present in $h_i$ are uncorrelated at first order (the covariance between $\xi_j$ and $\xi_{ij}$ is zero). In this form, the Hamiltonian is similar to the one governing the dynamics of the Hopfield neural network model [18, 19]. Considering the canonical form of the Hamiltonian chosen in [20],
adapted to a non-complete graph, the inverse temperature then reads

$$\beta = \frac{4\alpha v K}{C},$$

and

$$h_i = \frac{1}{2\alpha K \sqrt{v}} \sum_{c=1}^{C} \xi_i^c - \frac{2\sqrt{v}}{K} \sum_{j \in i} \xi_j\, \xi_{ij}.$$

The coefficients $u_i$ are the components of the Perron vector normalized to $\sqrt{N}$ (so that $u_i = O(1)$), associated to the largest eigenvalue $K$ of the incidence matrix$^1$. When the graph has some permutation symmetry with a uniform connectivity, $u_i$ reduces to 1 and $K$ to this connectivity. $K$ is considered from now on as an extensive parameter.
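The large-$C$ behaviour of the couplings can be checked numerically. The sketch below compares the coupling obtained from the exact pairwise marginals of a uniform mixture, one quarter of the interaction-strength parameter times the log-ratio of the pairwise marginals, with its leading-order approximation in terms of the centred and rescaled component marginals. Function names are hypothetical, and $v$ is estimated empirically from the sample of component marginals, which is an assumption of the example.

```python
import numpy as np

def exact_coupling(p, alpha):
    """beta * J_ij from the exact pairwise marginals of a uniform mixture."""
    C = p.shape[0]
    p11 = p.T @ p / C
    p00 = (1 - p).T @ (1 - p) / C
    p01 = (1 - p).T @ p / C
    p10 = p01.T
    return 0.25 * alpha * np.log(p11 * p00 / (p01 * p10))

def approx_coupling(p, alpha):
    """Leading-order approximation 4 * alpha * v * xi_ij / sqrt(C)."""
    C = p.shape[0]
    v = np.mean((p - 0.5) ** 2)            # empirical variance around 1/2
    xi = (p - 0.5) / np.sqrt(v)            # rescaled component marginals
    xi_ij = xi.T @ xi / np.sqrt(C)
    return 4.0 * alpha * v * xi_ij / np.sqrt(C)
```

Only off-diagonal entries are meaningful. For a large number of components, the difference between the two matrices is of relative order $1/\sqrt{C}$, and it is dominated by the product of the mean biases of the two variables.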



4.2 Phase diagram 

The mean-field theory of the Hopfield model has been solved by Amit, Gutfreund and Sompolinsky in [20] using replica techniques; these results were soon confirmed with the help of the cavity method [19], and later put on even firmer mathematical grounds in [21]. In this section we can simply read off some properties of our model from this mean-field theory. The order parameter introduced by [20] is

$$\mu_c \stackrel{\text{def}}{=} \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{s,\xi}\!\left(\xi_i^c\, u_i\, s_i\right), \qquad \forall c = 1, \ldots, C, \qquad (19)$$

where the expectation comprises both thermal averages and the expectation with respect to the quenched disorder variables $\xi_i^c$. This quantity measures the correlation between the spin bias in each component and the local magnetization. The projection on an arbitrary Perron vector has been taken into account for the sake of generality. Two cases arise in the thermodynamic limit.



(i) $C$ is large but fixed when $N \to \infty$. In that case, considering that

$$\beta = \frac{4v}{C} \lim_{N \to \infty} \alpha(N) K(N)$$

is a fixed parameter in the thermodynamic limit, then the mean-field free energy per variable directly adapted from [20] reads,

f iN) in, H = § E & - 1 E lo s [ 2 cosh (/? E «i w - - 2^0] , 

c i c 

$^1$ Here we keep track of the fact that we possibly deal with a non-complete graph with arbitrary topology given by some incidence matrix $A$: to each edge $(ij)$ preserved by the pruning procedure is associated the element $a_{ij} = 1$, while other elements are set to 0. Under the hypothesis that the second eigenvalue is sub-dominant w.r.t. $K$ (this is generally the case when, for example, the connectivity is extensive with the size of the system), only the Perron eigenvector is to be considered in the mean-field theory.

where subdominant terms in the $1/C$ expansion are implicitly neglected. The stable thermodynamical states are then obtained by solving the saddle point equation, which reads

1 N 

i—1 c 

i N 

= # X>< & 9 - & tanh(/3^^K (K Vc - A)), Vc = 1, . . . , C. 

i—1 c 

The last line is obtained after using that, from the first equation, $\mu$ is transverse, and after defining

_ d rf 2^ 
C0- 

These equations are very similar to the ones obtained in [22], and so are their solutions. For $\beta > \beta_c = 1$, $2C$ thermodynamically stable states, referred to as Mattis states in [22], appear. Each one of these states is macroscopically correlated or anti-correlated to one of the mixture components, i.e. a single component $\mu_c$ acquires a finite value. They are the only stable states up to some threshold value of $\beta$, where mixed stable states appear.



(ii) The number of components is extensive: $C = \eta K$. In that case, the terms corresponding to the local field $h_i$ become irrelevant: their contribution to the energy per variable is then $O(1/N)$. Hence the mean-field limit is directly described by the Hopfield model at inverse temperature

$$\beta = \frac{4\alpha v}{\eta}.$$

Let us simply describe the phase diagram $(T, \eta)$ (see Figure 4) obtained in [20] for binary $\xi_i^c \in \{-1, 1\}$. When $C$ is macroscopic, the mixture acts in part as a decorrelated random noise on the $J_{ij}$, so that a spin glass phase, characterized by the Edwards-Anderson order parameter

$$q \stackrel{\text{def}}{=} \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_\xi\!\left( \mathbb{E}_s(s_i \mid \{\xi\})^2 \right),$$

may develop and compete with the pure states encountered at finite $C$. Except for a finite number of components $c = 1, \ldots, s$, with which a finite overlap may persist in the thermodynamic limit, the order parameter $\mu_c$ is otherwise of order $O(1/\sqrt{N})$ for $c > s$, and the parameter $r$, a normalized sum over the components $c > s$ of the squared overlaps, which represents the mean square of the global overlap with these components and was also introduced in [20], may acquire a finite value. In the presence of an external field $h_i^{\mathrm{ext}} = h \cdot \xi_i$ correlated with the patterns, the mean-field equations of Amit, Gutfreund and Sompolinsky read

$$\mu = \mathbb{E}_{z,\xi}\!\left[ \xi \tanh\!\big(\beta(\sqrt{\eta r}\, z + \xi \cdot (\mu + h))\big) \right], \qquad (20)$$

$$q = \mathbb{E}_{z,\xi}\!\left[ \tanh^2\!\big(\beta(\sqrt{\eta r}\, z + \xi \cdot (\mu + h))\big) \right], \qquad (21)$$

$$r = \frac{q}{(1 - \beta + \beta q)^2}, \qquad (22)$$

where $z \sim \mathcal{N}(0, 1)$ and where $\xi$ and $h$ are $s$-component vectors, if one assumes the ground state to be a state correlated to $s$ components of the mixture. For $h = 0$, the phase diagram contains three phases, depending on the value of $T = 1/\beta$:

• the paramagnetic phase for $T > T_g$,

• the spin glass phase for $T_c < T < T_g$,

• the ferromagnetic phase for $T < T_c$, with spin configurations correlated with one of the mixture components (Mattis states).

These are separated by two phase transition lines, $T_g(\eta)$ (second order) and $T_c(\eta)$ (first order). An additional line $T_M(\eta)$ corresponds to the appearance of the Mattis states as metastable states for $T_c < T < T_M$, before they become ground states for $T < T_c$.
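The mean-field equations (20)-(22) can be explored numerically. The following sketch solves them by damped fixed-point iteration in the simplest setting: a single condensed pattern ($s = 1$), binary $\pm 1$ patterns and zero external field, in which case the average over the pattern is trivial and only the Gaussian average remains. The function name, the Gauss-Hermite quadrature, the damping factor and the initial condition are assumptions of the example, and the iteration is not guaranteed to behave well close to the transition lines, where the denominator of (22) may become small.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def ags_fixed_point(beta, eta, mu0=1.0, n_iter=500, n_nodes=80, damping=0.5):
    """Damped iteration of the mean-field equations for s = 1, h = 0.

    beta : inverse temperature, eta : ratio of components to connectivity.
    Returns (mu, q, r) after n_iter iterations, started from a retrieval-like state.
    """
    z, w = hermegauss(n_nodes)       # nodes and weights for the weight exp(-z^2 / 2)
    w = w / w.sum()                  # normalize to a standard normal expectation
    mu, q = mu0, mu0 ** 2
    r = q / (1.0 - beta * (1.0 - q)) ** 2
    for _ in range(n_iter):
        t = np.tanh(beta * (np.sqrt(eta * r) * z + mu))
        mu_new = float(w @ t)                                # equation (20)
        q_new = float(w @ t ** 2)                            # equation (21)
        r_new = q_new / (1.0 - beta * (1.0 - q_new)) ** 2    # equation (22)
        mu = damping * mu + (1.0 - damping) * mu_new
        q = damping * q + (1.0 - damping) * q_new
        r = damping * r + (1.0 - damping) * r_new
    return mu, q, r
```

For a vanishing ratio of components to connectivity, the noise term disappears and the overlap obeys the usual Curie-Weiss equation $\mu = \tanh(\beta\mu)$: it is non-zero below the critical temperature and vanishes above it. Scanning the two parameters and recording where a solution with non-zero overlap survives gives a rough numerical picture of the region where Mattis states exist.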

Coming back to our inverse problem of finding the most accurate model for inferring the underlying mixture distribution, the parameter $\alpha$ allows us to tune $\beta$ to the most adequate value. For this simplified formulation ($\xi_i^c \in \{-1, 1\}$), from the definition (19) of the order parameter and the definition (15) of $\xi$, we see that the requirement is basically to tune $\beta$ such that the global optimum corresponds to Mattis states with overlap

$$\mu = 2\sqrt{v}. \qquad (23)$$



4.3 Mean-field decimation curves 

When the decimation procedure described in Section 3.1 is performed, the various indicators $R(\rho)$, $E(\rho)$ or $D_{KL}(\rho)$, taken as functions of the fraction $\rho$ of observed variables, give us a set of decimation curves, which we want to analyse in the mean-field regime. When some variables are observed, the mean-field equations describing the statistical behaviour of the hidden variables are simply obtained by adding to their local field the field exerted by the observed variables. Let $\rho$ be the fraction of observed variables, and $\{s_i^*, i = 1, \ldots, \rho N\}$ the corresponding set. These variables are correlated to one of the underlying mixture components, which we choose to be $c = 1$ by convention. The reduced system then consists of the $M = (1 - \rho)N$ hidden variables, $\{s_i, i = 1, \ldots, M\}$. To simplify the discussion, we also assume that the connectivity in this set is reduced in the same proportion to $(1 - \rho)K$, which is effectively the case on a complete graph. The external local field experienced by any hidden variable $i$ now reads

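$$h_i^{\mathrm{ext}}(\rho) \stackrel{\text{def}}{=} h_i + \sum_{j \in i} J_{ij}\, s_j^* = \frac{1}{2\alpha K \sqrt{v}} \sum_{c=1}^{C} \xi_i^c + \frac{\sqrt{C}}{K} \Big( \sum_{j \in i} \xi_{ij}\, s_j^* - \frac{2\sqrt{v}}{\sqrt{C}} \sum_{j' \in i} \xi_{j'}\, \xi_{ij'} \Big) + \text{subdominant terms},$$

where the sum over $j$ runs over the observed neighbours of $i$, with $J_{ij}$ and $h_i$ given by (17) and (18). In the thermodynamic limit with $C = \eta K$, a relevant term survives in $h_i^{\mathrm{ext}}(\rho)$ because of the correlations of the $s^*$ with one of the mixture components (the first one by convention). As a result, keeping only the relevant term yields

$$h_i^{\mathrm{ext}}(\rho) = 2\rho\sqrt{v}\, \xi_i^1 + O\!\left(\frac{1}{\sqrt{K}}\right).$$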

For $\rho = 1$: the single variable marginals (called the beliefs) are directly obtained from $h^{\mathrm{ext}}$ in this limit. To evaluate the prediction error, we then simply have to compare

$$\hat p_i(s_i = s) = \frac{1}{2} + s\sqrt{v}\, \xi_i^1$$

with the corresponding limit belief,

$$p_i(s_i = s) = \frac{1}{2}\big(1 + s \tanh(\beta h_i^{\mathrm{ext}}(1))\big). \qquad (24)$$

After some algebra, we find (for $C \gg 1$ and when the $\xi \in \{-1, 1\}$ are binary) the following expression of the $D_{KL}$ error,



D KL(!)i ,p;) = (i + ^.ogi±0 + (I-v^log^f + O(-L), (25) 

with 2y / ?}j = tanh(/3y / i7), so that the error vanishes when 

2sfi = tanh(/3v^). 

For intermediate values of $\rho$: the mean-field equations are still valid after replacing $\beta$ by $(1 - \rho)\beta$ and $\eta$ by $\eta/(1 - \rho)$. The belief may be parametrized as in (24) by a local field, whose statistical ensemble is now represented by the following stochastic variable

$$h(\rho) \stackrel{\text{def}}{=} h^{\mathrm{ext}}(\rho) + \xi\,(1 - \rho)\mu + \sqrt{(1 - \rho)\, r\eta}\; z = \xi\big((1 - \rho)\mu + 2\rho\sqrt{v}\big) + \sqrt{(1 - \rho)\, r\eta}\; z,$$

where $\xi$ has variance 1, $z \sim \mathcal{N}(0, 1)$, and $r$ is such that $\mathbb{E}_{\xi,z}[\tanh^2(\beta h)] = q$. The mean Kullback-Leibler distance with the reference belief $\hat p$ then reads,



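$$D_{KL}(p, \hat p) = \mathbb{E}_{\xi,z}\!\left( \beta(h - \hat h) \tanh(\beta h) + \log \frac{\cosh \beta \hat h}{\cosh \beta h} \right) = \beta\mu\big[(1 - \rho)\mu + 2\rho\sqrt{v}\big] + \beta^2 r\eta\,(1 - \rho)(1 - q) + \mathbb{E}_{\xi,z}\!\left[ \frac{1}{2} \log \frac{1 - \tanh^2(\beta h)}{1 - 4\xi^2 v} - \operatorname{atanh}(2\xi\sqrt{v}) \tanh(\beta h) \right]. \qquad (26)$$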



For binary variables $\xi \in \{-1, 1\}$, we recover (25) when $\rho = 1$ with $\mu = 2\sqrt{v}$. In this special case it is in fact tempting to tune $\alpha$ such that the requirement (23) is fulfilled for any $\rho$. Tuning the function $\alpha(\rho)$ amounts to finding $\beta$ such that

$$2\sqrt{v} = \mathbb{E}_z\!\left[ \tanh\!\big(\beta(\sqrt{(1 - \rho)\, r\eta}\; z + 2\sqrt{v})\big) \right],$$

$$q = \mathbb{E}_z\!\left[ \tanh^2\!\big(\beta(\sqrt{(1 - \rho)\, r\eta}\; z + 2\sqrt{v})\big) \right],$$

together with equation (22) taken at the rescaled inverse temperature, $r = q/\big(1 - \beta(1 - \rho)(1 - q)\big)^2$, when $\sqrt{v}$ and $\eta$ are fixed parameters. Instead, when $\xi$ is continuously distributed, the resulting $D_{KL}$ error is then a superposition of elementary distances, and has a strictly positive lower bound.



4.4 Comparison with experimental results 

The numerical results presented in Figures 0]-[7| are obtained by running LBP on the experimental setting explained in Section 3.1, performed with a fixed intermediate value of $v = 0.15$, along with the inference model presented in Section O and O

Consider first what is expected to happen, for small enough values of C/N,
when correlated states are searched with the help of a guiding field (see
Section 3.1), while T is decreased along a vertical line on the phase diagram
(see top left of Figure 4): the spin-glass transition line $T_g$ is first
encountered, materialized by a sudden increase of r and q as well as $D_{KL}$
(see top right of Figure 4). The small amount of information contained in the
paramagnetic phase simply gets screened by the proliferation of spurious
states, none of them being correlated with the Mattis states ($\mu = 0$). Then
the line $T_M$ is passed through and correlated states appear, which are expected
to be detected by the guiding field, so that $\mu$ acquires a non-zero value, while
r decreases. In practice, as seen from the top left of Figure 5, the spin-glass
phase renders the guiding field ineffective when N increases. The pruning
procedure partially cures this problem, but a trade-off has to be found, as can
be seen from the bottom right of Figure 5: the density of spurious states
decreases when the pruning increases, but phase transition lines

Figure 4: Top left: phase diagram of the Hopfield model for $h^{\mathrm{ext}} = 0$. Points
represent results of optimal solutions obtained by CMAES for various sizes N.
Top right: order parameters as a function of $\beta$ if correlated states are
correctly detected by the guiding field. Global (bottom left) and partial
(bottom right) fitness values of these solutions.



get shifted in a way that allows only highly polarized states to be present; as a
result, the lower bound of $D_{KL}$ increases. Intermediate pruning thresholds have
actually been found by the optimization procedure (see next section) and the
phase diagram remains approximately valid, as seen by looking at the top right
and bottom left of Figure 5. We observe that the solutions remain close to the
$T_M$ line in Figure 4. Concerning the decimation plots (Figure 6), comparison
with the mean-field limit differs at low density $\rho$ because of finite-size
effects (top) and because of the spin-glass phase (bottom), which prevents LBP
from converging faithfully to the ground states. The saturation phenomenon of
the decimation curves, which occurs when $\rho$ tends to 1, is reproduced correctly
by the mean-field analysis. One would expect the $D_{KL}$ error to vanish as the
number of observed variables increases but, as indicated by (26), we have a
superposition of $D_{KL}$ errors, due to the dispersion in the polarization of
variables, which by definition cannot be made arbitrarily small. Still,
Figure 6 is an instance

2 The true phase diagram after pruning is actually unknown to us, because the links are
not chosen randomly. N seems to be more appropriate than K to define the temperature for
intermediate values of the pruning (e.g. K/N = 0.3).

[Figure 5 plots. Panel titles: v=0.15 C/N=0.04 K/N=1; v=0.15 C/N=0.02 K/N=0.3.]

Figure 5: The Kullback-Leibler error as a function of $\beta$ obtained experimentally
with an evanescent guiding field, and the corresponding mean-field expectation
(26). The top left plot shows the limitation due to the spin-glass phase. Effects
of the pruning procedure are shown on the other plots.



where an efficient prediction is obtained with less than five percent of observed
variables, which could be useful for real applications.

5 Continuous parameter optimization 

The definition (13) sets up a single-parameter model which, combined with
the pruning procedure, is in fact a two-parameter model $\omega = (\alpha, r)$, where
$r \in [0, 1]$ is the fraction of edges which are conserved. The model could be
straightforwardly extended by associating a coefficient $\alpha_a$ to each factor node
$a$. The determination of the set $\{\alpha_a, a \in \mathcal{F}\}$ for optimizing the model would
lead to a difficult continuous and combinatorial optimization problem. Instead,
assuming we have at hand a meaningful criterion to sort the factor nodes, we
may divide the distribution into a certain number of parts $q$, delimited by an
increasing set of quantiles $\{r_i, i = 0, \ldots, q\}$, with $r_0 = 0$ and $r_q < 1$, each
part associated to a parameter $\alpha_i$. As a result, given the number of parts $q$, we
have a $2q$-parameter model, $\omega^{(q)} = (\alpha_1, \ldots, \alpha_q, r_1, \ldots, r_q)$, which is well
suited to continuous optimization if $q$ is not too large (typically less than
100). This

[Figure 6 plots. Top panels: N=1000, C=20, v=0.15, alpha=0.158. Bottom panel:
beta=1.25, eta=0.04, v=0.15, with curves N=100 K=99, N=200 K=199 and
N=300 K=299. Horizontal axis: $\rho$.]

Figure 6: Bottom: Experimental decimation curves of $D_{KL}$ at fixed $\beta = 1.25$
and C/N = 0.04 for complete graphs, compared with their expected mean-field
limit. Top: Effect of pruning on the $D_{KL}$ error (left) and on the prediction error
(right) for C/N = 0.02 and $\beta = 4.74$.

requires the definition of a fitness function. We have conducted this program
on the pairwise model. The natural fitness function for this problem is obtained
from the decimation procedure explained in Section 3.1,

$$
F(\omega^{(q)}) \stackrel{\mathrm{def}}{=} \int_0^1 d\rho\,(1-\rho)\,D_{KL}(\rho),
$$

where $\rho$ is the fraction of observed variables. This fitness function is however
quite costly, so we use a surrogate fitness function based on the identification
of the fixed points:

c=l 

where $D^c_{KL}(0)$ represents the Kullback-Leibler marginal distance of a driven
fixed point (with the help of the evanescent guiding field introduced in
Section 3.1) to the corresponding mixture component c when all variables are
hidden. This surrogate fitness appears to be much less noisy and costly than the
original one, but still well correlated to it, as can be seen in Figure 7
(right). One can get an







Figure 7: Optimization results for a problem with N = 100 variables and C = 5
components, when q is increased. The optimization is performed with the upper
quantile $r_q$ bounded to 0.5. The average single variable $D_{KL}$ error as a function
of $\rho$, the fraction of observed variables (left). Correlation between the global
and partial fitness (right).



idea of the ruggedness of the fitness landscape by simply looking at Figure 5. As
a consequence we used a stochastic optimization algorithm, usually a well-suited
choice for rugged fitness landscapes. The optimizer chosen is the Covariance
Matrix Adaptation Evolution Strategy (CMA-ES) [23], where a population of
candidate solutions is sampled according to a multivariate normal distribution,
whose parameters (mean value and covariance matrix) are adapted according to
the feedback gathered along the optimization procedure. The underlying idea
of the adaptation mechanism is to increase the probability of sampling better
solutions. At the end of the search procedure, the sampling distribution gives
an estimate of the local curvature of the objective function.
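To make the sample, select and adapt loop concrete, here is a deliberately simplified evolution strategy. It is not CMA-ES: it samples from an isotropic Gaussian, recombines the best candidates into a new mean and uses a crude step-size rule, with no covariance matrix adaptation. The function names, the default settings and the toy rugged objective are assumptions made for illustration only.

```python
import math
import random

def rugged_toy(x):
    """Toy multimodal objective (a stand-in for the real, costly fitness)."""
    return sum(t * t + 0.3 * (1.0 - math.cos(5.0 * t)) for t in x)

def simple_es(fitness, x0, sigma0=0.5, lam=12, mu=4, n_gen=60, seed=0):
    """Minimize `fitness` with a basic (mu/mu, lambda) evolution strategy.

    Returns the best point ever evaluated and its fitness value.
    """
    rng = random.Random(seed)
    mean = list(x0)
    sigma = sigma0
    best_x, best_f = list(x0), fitness(x0)
    for _ in range(n_gen):
        # Sample lam candidates around the current mean.
        population = []
        for _ in range(lam):
            candidate = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            population.append((fitness(candidate), candidate))
        population.sort(key=lambda pair: pair[0])
        elite = population[:mu]
        # Recombination: the new mean is the average of the mu best candidates.
        mean = [sum(c[i] for _, c in elite) / mu for i in range(len(mean))]
        # Crude step-size rule: expand after an improvement, shrink otherwise.
        if elite[0][0] < best_f:
            best_f, best_x = elite[0][0], list(elite[0][1])
            sigma *= 1.2
        else:
            sigma *= 0.8
    return best_x, best_f
```

For instance, `simple_es(rugged_toy, [1.5, -1.0, 0.8])` returns a point whose value is never worse than that of the starting point. A full CMA-ES additionally adapts a covariance matrix, which is what provides the curvature estimate mentioned above.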

We have compared different ways of sorting the edges based on the set of
couplings $J_{ij}$ (see preceding section), which to some extent represent the
amount of information transmitted from one variable node to another. Based on
the electric network analogy (see e.g. [24]), we consider the following sorting
criteria:

• simple sorting, 

• absolute conductance sorting, 

• relative conductance sorting. 

We expect these to capture different properties of the underlying factor graph.
The simple sort is based on the value of $|J_{ij}|$ for each edge $(ij) \in E$. The
absolute conductance sort amounts to reweighting these couplings $J_{ij}$ by the
fraction of weighted spanning trees (WST) containing the edge $(ij)$, while the
relative conductance sort takes into account this fraction solely (the spanning
trees are weighted with these $|J_{ij}|$). Disappointingly, the simple sorting
procedure yields the best results. So if there exists a smarter way of sorting
the links, we might hopefully find it by analyzing the mean-field equations on a
pruned graph, which are not established yet. In any case, the example shown in
Figure 7 indicates that the optimization works when using this simple sorting
procedure. In this example, the global error is decreased by 40% with a
13-quantile parameter model with respect to the single-parameter model
(Figure 7, right). In addition, the improvements occur in the region of
interest, that is when $\rho < 0.2$ (Figure 7, left).
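The combination of simple sorting, pruning and quantile-based coefficients can be sketched as follows. This is an illustration under stated assumptions: edges are ranked by decreasing $|J_{ij}|$, part $k$ contains the edges whose rank fraction lies in $(r_{k-1}, r_k]$ with $r_0 = 0$, and edges ranked beyond $r_q$ are pruned. The function name `assign_alpha` and the data layout are hypothetical.

```python
def assign_alpha(couplings, quantiles, alphas):
    """Map each kept edge to its coefficient alpha_k.

    couplings: dict {(i, j): J_ij}
    quantiles: increasing list [r_1, ..., r_q] with values in [0, 1]
    alphas:    list [alpha_1, ..., alpha_q], one coefficient per part
    Edges whose rank fraction exceeds r_q do not appear in the result (pruned).
    """
    if len(quantiles) != len(alphas):
        raise ValueError("one alpha per quantile is required")
    # Simple sorting criterion: strongest couplings first.
    ranked = sorted(couplings, key=lambda e: abs(couplings[e]), reverse=True)
    n_edges = len(ranked)
    assignment = {}
    for rank, edge in enumerate(ranked, start=1):
        fraction = rank / n_edges
        for r_k, alpha_k in zip(quantiles, alphas):
            if fraction <= r_k:
                assignment[edge] = alpha_k
                break
    return assignment
```

With four edges, `quantiles=[0.25, 0.75]` and `alphas=[1.0, 0.4]`, the strongest edge receives 1.0, the next two receive 0.4 and the weakest one is pruned.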



6 Comparison with other approaches and perspectives

The model we propose shares some common points with the tree-reweighted
belief propagation algorithm described in [13] and with the fractional belief
propagation scheme [16]. The Bethe approximation is a particular case
of a general set of variational region-based free energy approximations [25].
Introducing for each variable and factor node the energies and entropies, 

$$
E_i = -\sum_{x_i} b_i(x_i)\log\phi_i(x_i), \qquad
E_a = -\sum_{x_a} b_a(x_a)\log\psi_a(x_a),
$$

$$
H_i = -\sum_{x_i} b_i(x_i)\log b_i(x_i), \qquad
H_a = -\sum_{x_a} b_a(x_a)\log b_a(x_a),
$$

and considering only the region associated to the factors, a general approxi- 
mation is obtained by introducing different counting numbers for the average 
energy and entropy, 

$$
F(b) = \sum_a \bigl(e_a E_a - h_a H_a\bigr) + \sum_i \bigl(e_i E_i - h_i H_i\bigr).
\tag{27}
$$
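As a sanity check of the free energy (27), the sketch below evaluates it on a three-variable chain, where the graph is a tree. With the Bethe counting numbers $e_a = e_i = h_a = 1$ and $h_i = 1 - d_i$, and with exact marginals as beliefs, the result must coincide with $-\log Z$. The potentials, the function names and the brute-force enumeration are assumptions chosen for this small example.

```python
import math
from itertools import product

def entropy(b):
    return -sum(p * math.log(p) for p in b.values() if p > 0.0)

def average_energy(b, potential):
    return -sum(p * math.log(potential[x]) for x, p in b.items() if p > 0.0)

def generalized_free_energy(psi, phi, b_a, b_i, e_a, h_a, e_i, h_i):
    """F(b) = sum_a (e_a E_a - h_a H_a) + sum_i (e_i E_i - h_i H_i)."""
    total = 0.0
    for a in psi:
        total += e_a[a] * average_energy(b_a[a], psi[a]) - h_a[a] * entropy(b_a[a])
    for i in phi:
        total += e_i[i] * average_energy(b_i[i], phi[i]) - h_i[i] * entropy(b_i[i])
    return total

def chain_example():
    """Return (Bethe free energy at exact marginals, -log Z) for a 3-spin chain."""
    values = (-1, 1)
    fields = {0: 0.3, 1: 0.0, 2: -0.2}          # arbitrary local fields
    couplings = {(0, 1): 0.8, (1, 2): -0.5}     # arbitrary pairwise couplings
    phi = {i: {x: math.exp(f * x) for x in values} for i, f in fields.items()}
    psi = {a: {(x, y): math.exp(J * x * y) for x in values for y in values}
           for a, J in couplings.items()}
    # Brute-force joint weights, partition function and exact marginals.
    weight = {}
    for x in product(values, repeat=3):
        w = 1.0
        for i in phi:
            w *= phi[i][x[i]]
        for (i, j) in psi:
            w *= psi[(i, j)][(x[i], x[j])]
        weight[x] = w
    Z = sum(weight.values())
    b_i = {i: {s: sum(w for x, w in weight.items() if x[i] == s) / Z
               for s in values} for i in phi}
    b_a = {(i, j): {(s, t): sum(w for x, w in weight.items()
                                if x[i] == s and x[j] == t) / Z
                    for s in values for t in values} for (i, j) in psi}
    degree = {i: sum(1 for a in psi if i in a) for i in phi}
    F = generalized_free_energy(
        psi, phi, b_a, b_i,
        e_a={a: 1.0 for a in psi}, h_a={a: 1.0 for a in psi},
        e_i={i: 1.0 for i in phi}, h_i={i: 1.0 - degree[i] for i in phi})
    return F, -math.log(Z)
```

Other choices of counting numbers can be explored by changing the four dictionaries passed to `generalized_free_energy`; the identity with $-\log Z$ is specific to the Bethe choice on a tree.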

The coefficients corresponding to the fractional belief propagation approach
of [16] are

$$
e_a = 1, \qquad e_i = 1, \qquad h_i = 1 - \sum_{a \ni i} h_a,
$$

where the $h_a$ are arbitrary real coefficients.

Concerning the tree-reweighted free energy of [13], which is defined for a
pairwise factor graph, the coefficients read

$$
e_{ij} = 1, \qquad e_i = 1, \qquad h_i = 1 - \sum_{j:\,(ij)\in E} h_{ij},
$$

where $h_{ij} \in [0, 1]$ represents the probability that edge $(ij)$ appears in a
spanning tree of $\mathcal{G}$, chosen randomly under some given measure on the set of
spanning trees. It is also a sub-case of fractional belief propagation.
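For very small graphs, such edge appearance probabilities can be obtained by brute force. The sketch below enumerates all spanning trees of a connected graph, weights each tree by the product of its edge weights and returns, for every edge, the weighted fraction of trees containing it. The function names are hypothetical and the enumeration is exponential in the number of edges, so this only illustrates the definition.

```python
from itertools import combinations

def is_spanning_tree(n_nodes, edges):
    """True if `edges` (n_nodes - 1 of them) contain no cycle (union-find check)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return False
        parent[root_i] = root_j
    return len(edges) == n_nodes - 1

def edge_appearance(n_nodes, weights):
    """weights: dict {(i, j): w_ij > 0} on a connected graph with nodes 0..n_nodes-1.

    Returns {(i, j): probability that the edge belongs to a random spanning tree},
    where a tree is drawn with probability proportional to the product of its weights.
    """
    edges = list(weights)
    total = 0.0
    containing = {e: 0.0 for e in edges}
    for subset in combinations(edges, n_nodes - 1):
        if is_spanning_tree(n_nodes, subset):
            w = 1.0
            for e in subset:
                w *= weights[e]
            total += w
            for e in subset:
                containing[e] += w
    return {e: containing[e] / total for e in edges}
```

On a triangle with unit weights each edge belongs to two of the three spanning trees, so every probability equals 2/3; in general the probabilities sum to the number of nodes minus one.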

Our choice instead amounts to considering the parametrization

$$
e_i = 1, \qquad h_i = 1 - d_i, \qquad h_a = 1,
$$

while the $e_a$ are arbitrary positive coefficients, noted $\alpha_a$, with the
convention (13) for $\phi$ and $\psi$.

It is however not this slight modification of the search space of approximate
variational free energies that characterizes our approach, but rather the
variational framework. In our case, we purposefully choose a non-convex
framework, because we want to allow many belief-propagation fixed points to be
present. Conversely, [16] and [13] strive to find a convex variational free
energy approximation. Further work is needed, possibly by extending the search
to the full variational space corresponding to the set of coefficients
$(e_a, e_i, h_a, h_i)$, to see which type of parametrization is best adapted to our
problem. Let us simply note for the moment that counting coefficients
$h_a \neq 1$ and $h_i \neq 1 - d_i$ yield some feedback in the definition of the
messages (see Appendix), which is precisely what this message passing procedure
is supposed to avoid in order to obtain fast convergence. Nevertheless, it would
be interesting to see whether the measure on weighted spanning trees deduced
from the strength of the coupling constants may be used to define a well-suited
tree-reweighted approximation.

The main observation of this work, namely that a mixture of well separated
probabilistic states may be efficiently encoded and decoded in a multiple set
of LBP fixed points, deserves further developments, both from the practical
and theoretical points of view. The analysis of the mean-field theory could be
extended to understand better how graph pruning affects the equations. More
generally, understanding better the influence of the graph structure on the
mean-field equations could yield as a byproduct an optimal way of sorting the
edges for the optimization procedure. Further work is also needed regarding the
effect of the factor graph on the storage capacity, when not restricting
ourselves, as in the present study, to a pairwise factor graph. While trying to
optimize the number of probabilistic patterns that may be encoded, we have at
the same time to restrain the connectivity of the graph, so that the advantage
of using a fast message passing procedure is preserved: a proper trade-off has
to be found. In addition, the connection with the Hopfield model helps us to
assess the limitations due to spin-glass effects, and developments in the field
of neural networks should help us to limit this drawback.

Acknowledgments This work was supported by the French National Research
Agency (ANR) grant N° ANR-08-SYSC-017.

References 

[1] C. Furtlehner, J.-M. Lasgouttes, and A. de La Fortelle. A belief propagation 
approach to traffic prediction using probe vehicles. In Proc. IEEE 10th Int. Conf. 
Intel. Trans. Sys., pages 1022-1027, 2007. 

[2] T. Heskes, M. Opper, W. Wiegerinck, O. Winther, and O. Zoeter. Approximate 
inference techniques with expectation constraints. J. Stat. Mech., page P11015, 
2005. 






[3] T. M. Cover and J. A. Thomas. Elements of Information Theory.
Wiley-Interscience, 2nd edition, 2006.

[4] F. R. Kschischang, B. J. Frey, and H. A. Loeliger. Factor graphs and the
sum-product algorithm. IEEE Trans. on Inf. Th., 47(2):498-519, 2001.

[5] Max Welling and Yee Whye Teh. Approximate inference in Boltzmann machines.
Artif. Intell., 143(1):19-50, 2003.

[6] H. A. Bethe. Statistical theory of superlattices. Proc. Roy. Soc. London A, 
150(871):552-575, 1935. 

[7] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Network of Plausible 
Inference. Morgan Kaufmann, 1988. 

[8] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. 
Advances in Neural Information Processing Systems, pages 689-695, 2001. 

[9] T. Heskes. Stable fixed points of loopy belief propagation are minima of the Bethe 
free energy. Advances in Neural Information Processing Systems, 15, 2003. 

[10] Sekhar Tatikonda and Michael Jordan. Loopy belief propagation and Gibbs mea- 
sures. In Proc. of the 18th An. Conf. on Uncertainty in Art. Intel. (UAI-02), 
pages 493-50, 2002. 

[11] J. M. Mooij and H. J. Kappen. Sufficient conditions for convergence of the
sum-product algorithm. IEEE Trans. on Inf. Th., 53(12):4422-4437, 2007.

[12] A. T. Ihler, J. W. Fischer III, and A. S. Willsky. Loopy belief propagation: 
convergence and effects of message errors. J. Mach. Learn. Res., 6:905-936, 2005. 

[13] M. J. Wainwright. Estimating the "wrong" graphical model: benefits in the 
computation-limited setting. JMLR, 7:1829-1859, 2006. 

[14] M. J. Wainwright. Stochastic processes on graphs with cycles: geometric and 
variational approaches. PhD thesis, MIT, January 2002. 

[15] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. Tree-reweighted belief prop- 
agation algorithms and approximate ML estimation by pseudomoment matching. 
Workshop on Artificial Intelligence and Statistics, 2003. 

[16] Wim Wiegerinck and Tom Heskes. Fractional belief propagation. In Advances in 
Neural Information Processing Systems 15, pages 438-445, 2003. 

[17] Y. Kabashima and D. Saad. Belief propagation vs. TAP for decoding corrupted 
messages. Europhys. Lett., 44:668, 1998. 

[18] J. J. Hopfield. Neural network and physical systems with emergent collective
computational abilities. Proc. of Natl. Acad. Sci. USA, 79:2554-2558, 1982.

[19] M. Mezard, G. Parisi, and M. A. Virasoro. Spin Glass Theory and Beyond. World 
Scientific, Singapore, 1987. 

[20] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Statistical mechanics of neural
networks near saturation. Annals of Physics, 173(1):30-67, 1987.

[21] M. Talagrand. Rigorous results for the Hopfield model with many patterns.
Probab. Th. Relat. Fields, 110:177-276, 1998.

[22] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Spin-glass models of neural 
networks. Phys. Rev. A, 32:1007-1018, 1985. 






[23] Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self- 
adaptation in evolution strategies. Evolutionary Computation, 9(2):159-195, 
2001. 

[24] G. Grimmett. Discrete spatial and physical processes in probability, 2008.
Lecture course at the IHP, Paris.

[25] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy
approximations and generalized belief propagation algorithms. IEEE Trans.
Inform. Theory, 51(7):2282-2312, 2005.

[26] Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and
belief propagation with convex free energies. In Proc. of the 23rd An. Conf. on
Uncertainty in Art. Intel. (UAI-07), 2007.






A Appendix: Generalizations to belief propagation algorithm

We adapt here the reasoning of to the free energy of Section [5] The function 
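that has to be studied to minimize the generalized Bethe free energy (27) reads

$$
\begin{aligned}
\mathcal{L}(b) &= \sum_{a,x_a} b_a(x_a)\log\frac{b_a(x_a)^{h_a}}{\psi_a(x_a)^{e_a}}
  + \sum_{i,x_i} b_i(x_i)\log\frac{b_i(x_i)^{h_i}}{\phi_i(x_i)^{e_i}} \\
 &\quad + \sum_{i,\,a\ni i,\,x_i}\lambda_{ai}(x_i)\Bigl(b_i(x_i) - \sum_{x_a\setminus x_i} b_a(x_a)\Bigr)
  - \sum_i \gamma_i\Bigl(\sum_{x_i} b_i(x_i) - 1\Bigr),
\end{aligned}
\tag{28}
$$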



with $\{\lambda_{ai}\}$ a set of Lagrange multipliers attached to each link, to ensure
compatibility conditions between joint beliefs and single beliefs, and $\{\gamma_i\}$ a
set destined to enforce single-belief normalization. The stationary points read

$$
\begin{aligned}
b_a(x_a) &= \psi_a(x_a)^{e_a/h_a}\exp\Bigl(\frac{1}{h_a}\sum_{i\in a}\lambda_{ai}(x_i) - 1\Bigr), \\
b_i(x_i) &= \phi_i(x_i)^{e_i/h_i}\exp\Bigl(\frac{1}{h_i}\bigl(\gamma_i - \sum_{a\ni i}\lambda_{ai}(x_i)\bigr) - 1\Bigr).
\end{aligned}
$$

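At this stationary point, the generalized Bethe free energy reads

$$
\begin{aligned}
F(b) &= -\sum_{a,x_a} b_a(x_a)\Bigl[h_a - \sum_{j\in a}\lambda_{aj}(x_j)\Bigr]
        - \sum_{i,x_i} b_i(x_i)\Bigl[h_i + \sum_{a\ni i}\lambda_{ai}(x_i) - \gamma_i\Bigr] \\
     &= \sum_i \gamma_i - \sum_a h_a - \sum_i h_i,
\end{aligned}
$$

and one can write

$$
\prod_a \psi_a(x_a)^{e_a}\prod_i \phi_i(x_i)^{e_i}
  = \prod_a b_a(x_a)^{h_a}\prod_i b_i(x_i)^{h_i}\; e^{-F(b)}.
$$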

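The compatibility constraint between the single variable beliefs $b_i$ and factor
beliefs $b_a$ yields for $i \in a$

$$
\sum_{x_a\setminus x_i}\psi_a(x_a)^{e_a/h_a}\prod_{j\in a} n_{j\to a}(x_j)^{1/h_a}
  \propto \phi_i(x_i)^{e_i/h_i}\prod_{a'\ni i} n_{i\to a'}(x_i)^{-1/h_i},
\tag{29}
$$

with the usual definition of the messages (up to a slight difference in
convention),

$$
n_{i\to a}(x_i) \stackrel{\mathrm{def}}{=} \exp\bigl(\lambda_{ai}(x_i)\bigr).
\tag{30}
$$

A simple way of getting a mapping suitable for an iterative algorithm is to
isolate the term $n_{i\to a}(x_i)$ to the left of the equation

$$
n_{i\to a}(x_i)^{\frac{1}{h_a}+\frac{1}{h_i}} \propto
\Bigl[\sum_{x_a\setminus x_i}\psi_a(x_a)^{e_a/h_a}\prod_{j\in a\setminus i} n_{j\to a}(x_j)^{1/h_a}\Bigr]^{-1}
\Bigl[\phi_i(x_i)^{e_i}\prod_{a'\ni i,\,a'\neq a} n_{i\to a'}(x_i)^{-1}\Bigr]^{1/h_i}.
$$

This relation yields a new message passing algorithm that would be a close
cousin of the lbp algorithm; the properties of this new algorithm have not been
investigated yet.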

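In order to obtain something that is closer to the original algorithm, we
define a new set $\{m\}$ of messages by the relation

$$
m_{a\to i}(x_i) \stackrel{\mathrm{def}}{=} n_{i\to a}(x_i)^{-1/h_a}\prod_{a'\ni i} n_{i\to a'}(x_i)^{-1/h_i},
$$

and rewrite (29) as

$$
m_{a\to i}(x_i) \propto \sum_{x_a\setminus x_i}\psi_a(x_a)^{e_a/h_a}
   \prod_{j\in a\setminus i} n_{j\to a}(x_j)^{1/h_a}\times\phi_i(x_i)^{-e_i/h_i}.
\tag{31}
$$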

This relation will produce an LBP-like algorithm if we invert the definition of
$\{m\}$. To this end, we write the identity

$$
\sum_{a'\ni i} h_{a'}\log m_{a'\to i}(x_i)
  = -\Bigl(1 + \frac{\sum_{a\ni i} h_a}{h_i}\Bigr)\sum_{a'\ni i}\log n_{i\to a'}(x_i),
$$

from which the following relation can be obtained

$$
\log n_{i\to a}(x_i) = -h_a\log m_{a\to i}(x_i)
  + \frac{h_a}{h_i + \sum_{b\ni i} h_b}\sum_{a'\ni i} h_{a'}\log m_{a'\to i}(x_i).
\tag{32}
$$

Equations (31)-(32) yield the update rules in this generalized setting. In
the case of fractional belief propagation, (32) reduces to

$$
\log n_{i\to a}(x_i) = -h_a\log m_{a\to i}(x_i) + h_a\sum_{a'\ni i} h_{a'}\log m_{a'\to i}(x_i).
$$

The ordinary LBP scheme corresponds to $h_a = 1$ and $h_i = 1 - d_i$. Note that,
contrary to the fractional belief propagation algorithm and to the
tree-reweighted algorithm, there is no feedback term apparent in the r.h.s. of
(31). This property ensures the independence of the messages in the absence of
loops.

However, the definition in (31) contains a feedback at second order, since
$m_{a\to i}$ depends on $m_{a\to j}$ for $j \neq i$, which themselves have been computed
from the former value of $m_{a\to i}$. This can be avoided only when

$$
-h_a\Bigl(1 - \frac{h_a}{h_i + \sum_{b\ni i} h_b}\Bigr) = 0,
$$

that is, $h_a = h$ and $h_i = (1 - d_i)h$ for some value of $h$. This setting is
equivalent to normal LBP.
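The inversion (32) and the feedback coefficient discussed above can be checked numerically at a single variable node. The sketch below works with log-messages at a fixed value of $x_i$; the function names and the dictionary layout are assumptions made for this illustration, and the denominator $h_i + \sum_{b\ni i} h_b$ is assumed to be non-zero.

```python
def log_n_from_log_m(log_m, h_a, h_i):
    """Apply relation (32) at one variable node i and one value x_i.

    log_m: dict {a: log m_{a->i}(x_i)} over the factors a containing i
    h_a:   dict {a: h_a} with the same keys
    h_i:   counting number of the variable node
    Returns {a: log n_{i->a}(x_i)}.
    """
    denom = h_i + sum(h_a[a] for a in log_m)
    weighted = sum(h_a[a] * log_m[a] for a in log_m)
    return {a: -h_a[a] * log_m[a] + h_a[a] * weighted / denom for a in log_m}

def feedback_coefficient(a, h_a, h_i):
    """Coefficient of log m_{a->i} in log n_{i->a}, i.e. -h_a (1 - h_a / (h_i + sum_b h_b))."""
    denom = h_i + sum(h_a.values())
    return -h_a[a] * (1.0 - h_a[a] / denom)
```

With $h_a = 1$ and $h_i = 1 - d_i$ the first function returns the sum of the other incoming log-messages, which is the ordinary LBP rule, and the feedback coefficient vanishes. It also vanishes for $h_a = h$, $h_i = (1 - d_i)h$, but not for generic fractional coefficients.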
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